Bayesian estimation of performance measures of cervical cancer screening tests in the presence of covariates and absence of a gold standard.

In this paper we develop a Bayesian analysis to estimate the disease prevalence and the sensitivity and specificity of three cervical cancer screening tests (cervical cytology, visual inspection with acetic acid and Hybrid Capture II) in the presence of a covariate and in the absence of a gold standard. We use the Metropolis-Hastings algorithm to obtain the posterior summaries of interest. The estimated prevalence of cervical lesions was 6.4% (95% credible interval [95% CI]: 3.9, 9.3). The sensitivity of cervical cytology (with a result of ≥ ASC-US) was 53.6% (95% CI: 42.1, 65.0), compared with 52.9% (95% CI: 43.5, 62.5) for visual inspection with acetic acid and 90.3% (95% CI: 76.2, 98.7) for Hybrid Capture II (with a result of >1 relative light units). The specificity of cervical cytology was 97.0% (95% CI: 95.5, 98.4), and the specificities of visual inspection with acetic acid and Hybrid Capture II were 93.0% (95% CI: 91.0, 94.7) and 88.7% (95% CI: 85.9, 91.4), respectively. The Bayesian model with covariates suggests that the sensitivity and the specificity of visual inspection with acetic acid tend to increase with the age of the women. The Bayesian method proposed here is a useful alternative for estimating performance measures of diagnostic tests in the presence of covariates and when a gold standard is not available. An advantage of the method is that the number of parameters to be estimated is not limited by the number of observations, as happens with several frequentist approaches. However, it is important to point out that the Bayesian analysis requires informative priors for the parameters to be identifiable. The method can easily be extended to the analysis of other medical data sets.


Introduction
The sensitivity (Se) and the specificity (Sp) are the two most common measures of the performance of a diagnostic test: Se is the probability that a diseased individual is correctly identified by the test, while Sp is the probability that an individual without the disease (or condition) of interest is correctly identified by the same test. When the outcomes of the diagnostic test are measured on a continuous scale, a cut-off value must be chosen to determine when an individual is classified as positive or negative. Generally, individuals with a test outcome greater than or equal to this fixed cut-off are classified as positive, while individuals with test outcomes below it are classified as negative.
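The definitions above can be made concrete with a small sketch that dichotomizes a continuous score at a cut-off (positive when the score is greater than or equal to the cut-off) and counts true and false results against a known disease status. The function name and the toy data are hypothetical and serve only as an illustration.

```python
from typing import Sequence, Tuple


def se_sp_at_cutoff(scores: Sequence[float], diseased: Sequence[bool],
                    cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity of a continuous test at a fixed cut-off.

    A result is called positive when score >= cutoff.
    """
    tp = sum(1 for s, d in zip(scores, diseased) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, diseased) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, diseased) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, diseased) if not d and s >= cutoff)
    se = tp / (tp + fn)  # P(test positive | diseased)
    sp = tn / (tn + fp)  # P(test negative | not diseased)
    return se, sp


# Hypothetical toy data, for illustration only.
scores = [0.2, 0.8, 1.5, 3.0, 1.2, 0.7, 0.9, 5.1]
diseased = [False, False, True, True, False, True, False, True]
print(se_sp_at_cutoff(scores, diseased, cutoff=1.0))  # (0.75, 0.75)
```

Raising the cut-off trades sensitivity for specificity: with these data a cut-off of 2.0 gives Se = 0.5 and Sp = 1.0.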
Although the true disease status of an individual can be verified by a procedure generically called the gold standard, it is common to find situations where the true disease status of a proportion of the sampled individuals cannot be verified. The problem occurs especially when the gold standard is an invasive and/or risky procedure, so that definitive verification of apparently healthy individuals is neither practical nor ethical. To overcome this problem, many studies evaluating diagnostic tests consider only the verified individuals. However, this approach usually leads to biased measures, a problem known as verification bias or workup bias.
Cancer Informatics 2008: 6 33-46
Unbiased estimators of Se and Sp in this setting were introduced by Begg and Greenes [1] and Zhou [2].
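As an illustration of how such a correction works, the sketch below implements the Begg and Greenes idea for a single binary test, assuming that the decision to verify depends only on the observed test result. Under that assumption P(D | T, verified) = P(D | T), so the disease rates among verified positives and negatives can be reweighted by the test-result frequencies of the whole cohort via Bayes' theorem. The function name and counts are hypothetical.

```python
def begg_greenes(n_pos, n_neg, v_pos, d_pos, v_neg, d_neg):
    """Verification-bias-corrected Se and Sp for one binary test.

    n_pos, n_neg: all tested individuals with positive / negative results.
    v_pos, v_neg: how many of them were verified by the gold standard.
    d_pos, d_neg: diseased among the verified positives / negatives.
    Assumes verification depends only on the test result.
    """
    n = n_pos + n_neg
    p_t1, p_t0 = n_pos / n, n_neg / n                 # P(T = 1), P(T = 0)
    p_d_t1, p_d_t0 = d_pos / v_pos, d_neg / v_neg     # P(D = 1 | T = t)
    se = p_t1 * p_d_t1 / (p_t1 * p_d_t1 + p_t0 * p_d_t0)
    sp = (p_t0 * (1 - p_d_t0)
          / (p_t0 * (1 - p_d_t0) + p_t1 * (1 - p_d_t1)))
    return se, sp


# Hypothetical cohort: all 100 positives verified (60 diseased),
# 90 of 900 negatives verified (9 diseased).
se, sp = begg_greenes(100, 900, 100, 60, 90, 9)
print(round(se, 3), round(sp, 3))  # 0.4 0.953
```

Using only the verified individuals would instead give Se = 60/69 ≈ 0.87 and Sp = 81/121 ≈ 0.67 for the same hypothetical data.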
Another problem arises when no individual can be verified by a gold standard. This occurs when there is no definitive test for detecting the disease, or when verification by a gold standard is impractical because of its cost, accessibility or risks. In this situation, maximum likelihood estimators were proposed by Hui and Walter [3]. However, these estimators are reasonable only when the number of observations is greater than or equal to the number of parameters, which is not our case, as we will see later. Free of this limitation, a Bayesian approach was introduced by Joseph et al. [4]. However, the method of Joseph et al. [4] does not consider the presence of covariates, which are very common in data from diagnostic test studies.
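One common way to make the counting argument concrete is to compare the degrees of freedom of the cross-classified test results with the number of free parameters of the conditionally independent latent class model. Each population yields a 2^k table with 2^k − 1 free cell probabilities, while the model has one Se and one Sp per test plus one prevalence per population. The helper below is a hypothetical sketch of this bookkeeping; df ≥ parameters is necessary but not sufficient for identifiability.

```python
def hui_walter_counts(k_tests: int, n_populations: int = 1):
    """Degrees of freedom vs. parameters for k conditionally independent tests."""
    df = n_populations * (2 ** k_tests - 1)   # free cells of the 2^k tables
    params = 2 * k_tests + n_populations      # Se_m, Sp_m, one prevalence each
    return df, params


for k, s in [(2, 1), (2, 2), (3, 1)]:
    df, params = hui_walter_counts(k, s)
    print(f"k={k}, populations={s}: df={df}, parameters={params}")
```

Two tests in one population are under-identified (3 < 5), and three tests in one population are only just identified (7 = 7), so any additional parameters, such as covariate effects, quickly exceed what the table alone can support.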
The objective of the present study is to estimate the performance measures of cervical cytology, Hybrid Capture II (HC II) and visual inspection with acetic acid (VIA) in the detection of cervical precursor lesions, using a Bayesian statistical method that allows these measures to be estimated even though part of the sampled women were not verified by a gold standard. We also consider the presence of covariates. Because Bayesian methods incorporate historical information and expert opinion into the modelling strategy (the so-called prior information), these elements can be too subjective and become another source of bias. In other words, inadequate prior information can lead to biased estimates. However, a careful verification of the prior information, followed by an analysis of how changes in it affect the results, can yield reasonable estimates of the tests' performance measures.
Thus, the new methodological contribution of the present paper is an extension of the Bayesian method proposed by Joseph et al. [4] for estimating the performance measures of screening tests that introduces a vector of covariates. The rest of the paper is organized as follows. In Section 2, we discuss the definition of a gold standard in accuracy studies of cervical cancer screening tests. Section 3 describes the method of Joseph et al. [4] for estimating Se and Sp of two diagnostic tests in the absence of a gold standard and introduces the notation used in the paper; it then presents the methodology for estimating Se and Sp in the presence of a covariate. The cervical cancer screening data set is described in Section 4. The application of the proposed methodology to the data set is presented in Section 5. Concluding remarks are given in Section 6.

Accuracy of Cervical Cancer Screening Tests
South and Central America have some of the highest incidence rates of cervical carcinoma in the world, ranging from 30 to 40 per 100,000 women, or three to four times the incidence in developed countries [5]. In Brazil, crude estimates of incidence and mortality are 19.82 and 4.49 per 100,000 women, respectively [6]. There is thus strong justification for analysing the accuracy of different diagnostic tools for cervical carcinoma and their efficacy in screening programs.
In assessing the accuracy of cervical cancer screening tests, it is not straightforward to define an ideal gold standard. In many studies, the gold standard for evaluating the accuracy of screening tests in detecting true positive lesions is histopathology. If biopsies are not obtained, colposcopy is accepted as the final diagnosis. However, colposcopy can give many false negative results when used to discriminate between normal and abnormal tissues (see, for example, Mitchell et al. 1998 and Hopman et al. 1998) [7,8].
The reference test used in these studies, defined by the results of histology or colposcopy, is thus subject to errors, and the resulting estimates of sensitivity and specificity can be biased. Another type of bias arises when only part of the sampled individuals have their true disease status confirmed by biopsy and the remainder are excluded from the calculation of sensitivity and specificity. This occurs mainly when only the women with a positive result on one or more diagnostic tests (or with positive clinical signs) are referred to the gold standard, and this selection results in an overestimated sensitivity and an underestimated specificity [9].
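The direction of this bias can be checked with a short derivation. Suppose, as an illustrative assumption, that every test-positive woman is verified, that a random fraction v of test-negative women is verified, and that unverified women are simply dropped. The prevalence cancels, and the apparent measures become Se / (Se + (1 − Se)v) and Sp·v / (Sp·v + 1 − Sp). The sketch below evaluates these expressions; the function name and inputs are hypothetical.

```python
def naive_se_sp(se: float, sp: float, v_neg: float):
    """Apparent Se and Sp computed only on verified individuals, when all
    test-positives are verified and a fraction v_neg of test-negatives is."""
    naive_se = se / (se + (1 - se) * v_neg)          # >= se whenever v_neg <= 1
    naive_sp = sp * v_neg / (sp * v_neg + (1 - sp))  # <= sp whenever v_neg <= 1
    return naive_se, naive_sp


# Hypothetical true Se = 0.6, Sp = 0.9, with 10% of negatives verified.
print(naive_se_sp(0.6, 0.9, 0.1))
```

With these values the apparent sensitivity rises from 0.60 to 0.9375 and the apparent specificity falls from 0.90 to about 0.47; with v = 1 (complete verification) both are unbiased.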
Despite the development of new methods to estimate the sensitivity and specificity of screening tests without a gold standard [10] or in the presence of verification bias [11], many studies on the accuracy of cervical cancer screening tools present biased results owing to the limitations of the chosen gold standard. For example, in a recent meta-analysis of studies on the performance of conventional cervical cytology, McCrory et al. (1999) evaluated 939 studies, of which 84 met the quality standards established by the authors. Of these 84 studies, only three were free of verification bias [12].
Many methods for the absence of a gold standard have been introduced in the literature. For instance, Hui and Walter (1980) derived equations that compute estimates and standard errors of sensitivity, specificity and prevalence without a reference test [3]. Joseph et al. (1995) introduced a Bayesian model using latent variables [4], and Dendukuri and Joseph (2001) extended this method to account for conditional dependence between the diagnostic tests [13]. Other important statistical contributions were provided by Faraone and Tsuang (1994) [14], Qu et al. (1996) [15] and Hadgu and Qu (1998) [16].

The Bayesian Framework
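Considering $k$ diagnostic tests, let $T_m = 1$ if the result of test $m$ is positive and $T_m = 0$ if it is negative, for $m = 1, \dots, k$. Let $Se_m$ and $Sp_m$ be the sensitivity and the specificity of test $m$, respectively, and let $g$ be an observation of a binary latent variable $G$ ($g = 1$ for a diseased individual and $g = 0$ otherwise), introduced in the model to represent the non-observable gold standard [17]. Denoting the observations and the latent variable for the $i$-th individual by $(t_{1i}, \dots, t_{ki}, g_i)$, $i = 1, \dots, n$, and the parameter vector by $\theta = (p, Se_1, \dots, Se_k, Sp_1, \dots, Sp_k)$, where $p$ is the prevalence, the likelihood function for $\theta$, assuming that the tests are conditionally independent given the true disease status, is

$$L(\theta) = \prod_{i=1}^{n} \Big[ p \prod_{m=1}^{k} Se_m^{t_{mi}} (1 - Se_m)^{1 - t_{mi}} \Big]^{g_i} \Big[ (1 - p) \prod_{m=1}^{k} (1 - Sp_m)^{t_{mi}} Sp_m^{1 - t_{mi}} \Big]^{1 - g_i}. \tag{1}$$

The latent variable $G$, following Bayes' theorem, has a Bernoulli distribution, that is,

$$G_i \mid t_{1i}, \dots, t_{ki}, \theta \sim \text{Bernoulli}(\pi_i), \qquad \pi_i = \frac{p \prod_{m=1}^{k} Se_m^{t_{mi}} (1 - Se_m)^{1 - t_{mi}}}{p \prod_{m=1}^{k} Se_m^{t_{mi}} (1 - Se_m)^{1 - t_{mi}} + (1 - p) \prod_{m=1}^{k} (1 - Sp_m)^{t_{mi}} Sp_m^{1 - t_{mi}}}. \tag{2}$$

Considering beta prior densities $\text{Beta}(\alpha_\theta, \beta_\theta)$ for all parameters in $\theta$, where $\alpha_\theta$ and $\beta_\theta$ generically denote fixed hyperparameters, and combining the likelihood function (1) with the prior densities, we use the Gibbs sampling algorithm [18,19] to simulate samples from the posterior distribution of $\theta$. These samples are simulated from the full conditional posterior distributions of $p$, $Se_m$ and $Sp_m$. Following Equations (1) and (2), the conditional posterior distributions needed for the Gibbs sampling algorithm are

$$p \mid g \sim \text{Beta}\Big(\alpha_p + \sum_{i=1}^{n} g_i,\; \beta_p + n - \sum_{i=1}^{n} g_i\Big),$$

$$Se_m \mid g, t \sim \text{Beta}\Big(\alpha_{Se_m} + \sum_{i=1}^{n} g_i t_{mi},\; \beta_{Se_m} + \sum_{i=1}^{n} g_i (1 - t_{mi})\Big),$$

$$Sp_m \mid g, t \sim \text{Beta}\Big(\alpha_{Sp_m} + \sum_{i=1}^{n} (1 - g_i)(1 - t_{mi}),\; \beta_{Sp_m} + \sum_{i=1}^{n} (1 - g_i)\, t_{mi}\Big),$$

for $m = 1, \dots, k$. In each cycle of the algorithm, a new value of the latent variable $G$ is generated from (2).

To introduce covariates, consider a single diagnostic test and let $W_i = (W_{0i}, W_{1i}, \dots, W_{Li})$ be the covariate vector of the $i$-th individual, with $W_{0i} = 1$. Using the logit link function, we write

$$\theta_{li} = \frac{\exp\big(\sum_{j=0}^{L} \beta_{lj} W_{ji}\big)}{1 + \exp\big(\sum_{j=0}^{L} \beta_{lj} W_{ji}\big)},$$

where $l = 1, 2, 3$, $\theta_{1i} = Se_i$, $\theta_{2i} = Sp_i$ and $\theta_{3i} = p_i$, for $i = 1, \dots, n$. In this way, we have a vector of parameters $\beta = (\beta_1, \beta_2, \beta_3)$, where $\beta_l = (\beta_{l0}, \beta_{l1}, \dots, \beta_{lL})$, $l = 1, 2, 3$. Assuming prior independence among the parameters, we consider normal prior densities for $\beta_{lj}$ with fixed hyperparameters $a_{lj}$ (means) and $b_{lj}^2$ (variances), $l = 1, 2, 3$, $j = 0, 1, \dots, L$. The likelihood function for $\beta$ is given by

$$L(\beta) = \prod_{i=1}^{n} \big[ p_i \, Se_i^{t_i} (1 - Se_i)^{1 - t_i} \big]^{g_i} \big[ (1 - p_i) (1 - Sp_i)^{t_i} Sp_i^{1 - t_i} \big]^{1 - g_i},$$

and posterior samples of $\beta$ are obtained with the Metropolis-Hastings algorithm. In each cycle of the algorithm, a new value of the latent variable $G$ is generated as in (2), with $p$, $Se$ and $Sp$ replaced by $p_i$, $Se_i$ and $Sp_i$.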
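A minimal sketch of the Gibbs sampler for the model without covariates is given below, assuming conditionally independent tests and beta priors. It draws a latent disease indicator for every individual from Equation (2) and then updates p, Se_m and Sp_m with conjugate beta draws; summing individual Bernoulli draws within a cell of the cross-classified table is equivalent to drawing the latent cell counts used by Joseph et al. The function name, starting values and arguments are hypothetical, and no attempt is made to handle label switching, which informative priors are meant to prevent.

```python
import numpy as np


def gibbs_latent_class(t, prior_se, prior_sp, prior_p=(0.5, 0.5),
                       n_iter=10_000, burn_in=2_000, seed=0):
    """Gibbs sampler for p, Se_m, Sp_m with k conditionally independent tests.

    t        : (n, k) array of 0/1 test results.
    prior_se : list of k (alpha, beta) pairs for the sensitivities.
    prior_sp : list of k (alpha, beta) pairs for the specificities.
    Returns post-burn-in draws: p (n_keep,), Se (n_keep, k), Sp (n_keep, k).
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=int)
    n, k = t.shape
    a_se, b_se = np.array(prior_se, dtype=float).T
    a_sp, b_sp = np.array(prior_sp, dtype=float).T
    p, se, sp = 0.5, np.full(k, 0.8), np.full(k, 0.8)  # starting values
    keep_p, keep_se, keep_sp = [], [], []
    for it in range(n_iter):
        # Step 1: latent disease status G_i from Equation (2).
        like_d = p * np.prod(se ** t * (1 - se) ** (1 - t), axis=1)
        like_nd = (1 - p) * np.prod((1 - sp) ** t * sp ** (1 - t), axis=1)
        g = rng.random(n) < like_d / (like_d + like_nd)
        # Step 2: conjugate beta updates given the latent g.
        n_d = g.sum()
        p = rng.beta(prior_p[0] + n_d, prior_p[1] + n - n_d)
        pos_d = t[g].sum(axis=0)    # positives among latent diseased
        pos_nd = t[~g].sum(axis=0)  # positives among latent non-diseased
        se = rng.beta(a_se + pos_d, b_se + n_d - pos_d)
        sp = rng.beta(a_sp + (n - n_d) - pos_nd, b_sp + pos_nd)
        if it >= burn_in:
            keep_p.append(p)
            keep_se.append(se)
            keep_sp.append(sp)
    return np.array(keep_p), np.array(keep_se), np.array(keep_sp)
```

Posterior means and percentile intervals of the returned draws give summaries of the kind reported in the Results.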
In studies of the performance of two or more independent diagnostic tests applied to a selected group of individuals, where none of the tests can be considered the gold standard, a straightforward extension of this model can be used. Considering the three diagnostic tests, cervical cytology, VIA and HC II, the vector of unknown parameters is now $\beta = (\beta_1, \dots, \beta_7)$, where $\beta_l = (\beta_{l0}, \beta_{l1}, \dots, \beta_{lL})$, $l = 1, \dots, 7$, are vectors of parameters related to the sensitivity and the specificity of each test and to the prevalence of cervical lesions. Let $T_{mi}$ be a random variable with observation $t_{mi}$ for test $m$, $m = 1, 2, 3$, and let $g_i$ be an observation of the latent variable $G$ given by (2). Using the logit link function to relate the vector $W_i$ of $L$ covariates (with $W_{0i} = 1$) to the screening performance measures, $i = 1, \dots, n$, and writing $\eta_{li} = \sum_{j=0}^{L} \beta_{lj} W_{ji}$ and $\text{logit}^{-1}(x) = e^{x}/(1 + e^{x})$, the likelihood function for $\beta$ is now

$$L(\beta) = \prod_{i=1}^{n} \Big[ p_i \prod_{m=1}^{3} Se_{mi}^{t_{mi}} (1 - Se_{mi})^{1 - t_{mi}} \Big]^{g_i} \Big[ (1 - p_i) \prod_{m=1}^{3} (1 - Sp_{mi})^{t_{mi}} Sp_{mi}^{1 - t_{mi}} \Big]^{1 - g_i},$$

with $Se_{mi} = \text{logit}^{-1}(\eta_{mi})$, $Sp_{mi} = \text{logit}^{-1}(\eta_{m+3,i})$ and $p_i = \text{logit}^{-1}(\eta_{7i})$. In this expression, the vectors of parameters $\beta_1$, $\beta_2$ and $\beta_3$ are related to the sensitivities of cervical cytology, VIA and HC II, respectively; $\beta_4$, $\beta_5$ and $\beta_6$ are related to the specificities of cervical cytology, VIA and HC II, respectively; and $\beta_7$ is related to the prevalence of cervical lesions. We consider normal prior densities for $\beta_{lj}$ with fixed hyperparameters $a_{lj}$ (means) and $b_{lj}^2$ (variances), $l = 1, \dots, 7$, $j = 0, 1, \dots, L$. Combining the prior distributions with $L(\beta)$, the conditional posterior distributions for $\beta$ are

$$\pi(\beta_{lj} \mid \beta_{(lj)}, g, t) \propto \exp\Big\{ -\frac{(\beta_{lj} - a_{lj})^2}{2 b_{lj}^2} \Big\} L(\beta),$$

where $j = 0, 1, \dots, L$ and $\beta_{(lj)}$ is the vector of all parameters except $\beta_{lj}$ (for example, $\beta_{(10)}$ contains every parameter except $\beta_{10}$). Because these conditional distributions are difficult to sample from directly, samples for all parameters are generated with the Metropolis-Hastings algorithm [47].
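The sampling scheme for the three-test covariate model can be sketched as one Metropolis-Hastings-within-Gibbs sweep: draw the latent g from Equation (2) with covariate-specific Se, Sp and p, then update each coefficient β_lj in turn with a symmetric random-walk proposal, accepting with the ratio of prior times complete-data likelihood. This is a minimal sketch, not the authors' code: the function names, the 7 × (L+1) array layout, and the step size of 0.2 are assumptions, since the text does not report its proposal distribution.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def log_lik_beta(beta, W, t, g):
    """Complete-data log-likelihood log L(beta) given the latent g.

    beta : (7, L+1); rows 0-2 Se, rows 3-5 Sp, row 6 prevalence.
    W    : (n, L+1) covariates with a leading column of ones.
    t    : (n, 3) 0/1 results; g : (n,) 0/1 latent disease status.
    """
    prob = expit(W @ beta.T)  # (n, 7) probabilities
    se, sp, p = prob[:, 0:3], prob[:, 3:6], prob[:, 6]
    ll_d = np.log(p) + np.sum(t * np.log(se) + (1 - t) * np.log1p(-se), axis=1)
    ll_nd = np.log1p(-p) + np.sum(t * np.log1p(-sp) + (1 - t) * np.log(sp), axis=1)
    return np.sum(np.where(g == 1, ll_d, ll_nd))


def mh_sweep(beta, W, t, a, b, rng, step=0.2):
    """Draw g from Eq. (2), then a random-walk MH update of each beta_lj.

    a, b : prior means and standard deviations, same shape as beta.
    """
    prob = expit(W @ beta.T)
    se, sp, p = prob[:, 0:3], prob[:, 3:6], prob[:, 6]
    like_d = p * np.prod(se ** t * (1 - se) ** (1 - t), axis=1)
    like_nd = (1 - p) * np.prod((1 - sp) ** t * sp ** (1 - t), axis=1)
    g = (rng.random(len(t)) < like_d / (like_d + like_nd)).astype(int)

    def log_post(bb):
        log_prior = -0.5 * np.sum((bb - a) ** 2 / b ** 2)  # independent normals
        return log_lik_beta(bb, W, t, g) + log_prior

    current = log_post(beta)
    for l in range(beta.shape[0]):
        for j in range(beta.shape[1]):
            proposal = beta.copy()
            proposal[l, j] += step * rng.standard_normal()
            cand = log_post(proposal)
            if np.log(rng.random()) < cand - current:  # symmetric proposal
                beta, current = proposal, cand
    return beta, g
```

Updating one coefficient at a time mirrors the conditional posteriors π(β_lj | β_(lj), g, t) above; block updates of each β_l or tuned step sizes are common alternatives.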

Data Set
The data set comes from an ongoing study funded by the European Commission, the LAMS (Latin American Screening) study, in which Pap smear/liquid-based cytology and screening colposcopy were compared with three optional screening tools (visual inspection with acetic acid, visual inspection with Lugol's iodine and cervicography) and with Hybrid Capture II on conventional samples and self-samples, in women at different levels of risk for cervical cancer, in three Brazilian arms (São Paulo, Campinas and Porto Alegre) and one Argentine arm (Buenos Aires). The study design and baseline data of the LAMS study were presented by Syrjanen et al. (2005) [20]. Partial results from the LAMS study were provided by Sarian et al. (2005) [21].
In the present study, we considered the data from Campinas, one of the three Brazilian arms of the LAMS study. From February to December 2002, 1,195 women were recruited at a basic health unit, and from July to December 2002, 221 women were recruited at the University Hospital (Centro de Atenção Integral à Saúde da Mulher, CAISM). Both services are located in Campinas, a city of 969,396 inhabitants in southeastern Brazil. Of these 1,416 women, 809 were eligible for the present analysis of the sensitivity and specificity of three cervical cancer screening tests (cervical cytology, visual inspection with acetic acid and Hybrid Capture II) in the presence of covariates and in the absence of a gold standard, and were willing to participate. Women were eligible if they were between 18 and 60 years of age, had undergone all three diagnostic methods and had an intact uterus. Women previously treated for condylomas or with current or previous abnormal cytology were excluded. Women with confirmed immunosuppression, immunodeficiency or HIV infection, and those who had had sexual intercourse or used vaginal medication in the previous three days, were not included. Informed consent was obtained from all participating women. The study protocol was reviewed and approved by the Committee of Ethics in Research of the Medical Science School of the State University of Campinas.
Cervical cytology was collected after evaluation and treatment of possible infectious processes, using Ayre spatulas and cervical brushes. The samples were stained according to the Papanicolaou method and evaluated using the Bethesda System [21]. Cytology was considered positive if it showed cellular atypia, irrespective of severity. Ecto- and endocervical samples for second-generation Hybrid Capture (HC II) were collected with sterile endocervical brushes supplied by Digene Diagnostics and processed following the manufacturer's instructions (Digene Diagnostics Inc.). HC II is a molecular biological method that detects HPV DNA through a chemiluminescent reaction. It is commercialized as standard kits and is based on a hybridization reaction, carried out in a series of solutions, with non-radioactive RNA probes of known sequence. Viral load was measured in relative light units (RLU/CO), and HC II results were categorized as negative if <1 RLU/CO and positive otherwise. After the collection of the cervical cytology and HC II samples, dilute (5 percent) acetic acid was applied to the cervix. One minute later, the cervix was illuminated with adapted spotlights (100 W) and examined with the naked eye for acetowhite areas. The visual appearance was classified according to the Atlas for Unaided Visual Inspection of the Cervix [22] using the categories normal, atypical, intraepithelial neoplasia or suggestive of cervical cancer. Normal or atypical results were classified as negative, and intraepithelial neoplasia or suggestive of cervical cancer as positive. More details on the study protocol can be found in Syrjanen et al. (2005) [20].
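The dichotomization rules above can be summarized as a small coding function that maps each woman's raw results to the binary outcomes (T1, T2, T3) = (cytology, VIA, HC II) used in the analysis. The category label strings (for example "NILM" for a negative Bethesda result) and the function name are hypothetical choices for illustration; the thresholds follow the text.

```python
VIA_POSITIVE = {"intraepithelial neoplasia", "suggestive of cervical cancer"}


def code_results(cytology: str, hc2_rlu_co: float, via: str) -> tuple:
    """Return (T1, T2, T3) = (cytology, VIA, HC II) as 0/1 outcomes.

    Label strings are hypothetical; any cytological atypia counts as positive.
    """
    t1 = 0 if cytology == "NILM" else 1   # ASC-US or worse -> positive
    t2 = 1 if via in VIA_POSITIVE else 0  # normal / atypical -> negative
    t3 = 0 if hc2_rlu_co < 1.0 else 1     # negative if < 1 RLU/CO
    return t1, t2, t3


print(code_results("ASC-US", 1.0, "atypical"))  # (1, 0, 1)
```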

Results
First, the sensitivity and specificity of cervical cytology (T1), VIA (T2) and HC II (T3) were estimated by the Bayesian approach proposed by Joseph et al. (1995) [4]. This method was developed for situations where a reference test is not available, and it assumes that the tests are conditionally independent. Seven parameters were estimated: the prevalence of preneoplastic or neoplastic lesions and the sensitivity and specificity of each of the three diagnostic methods under evaluation. An important feature of the Bayesian approach is the combination of the data obtained by the current sampling scheme with prior information about the parameters of interest. This prior information is introduced quantitatively in the statistical analysis and can represent the pooled subjective opinions of experts or information derived from the published literature. In the present study, we initially defined the prior information from the medical literature, using beta probability distributions.
The prior information about the sensitivity and specificity of cervical cytology was based on the systematic review of Nanda et al. (2000), who reported sensitivities for atypical squamous cells of undetermined significance (ASC-US) or worse ranging from 29 percent to 56 percent and specificities ranging from 97 percent to 100 percent [23]. The studies of Belinson et al. (2001) and of the University of Zimbabwe and JHPIEGO Cervical Cancer Project (1999) were used as references for the prior information about the sensitivity and specificity of VIA [25][26]. In these studies, the sensitivity of VIA for at least CIN II was estimated at 55 percent and 64 percent, respectively, and the specificity at 76 percent and 67 percent, respectively. The prior information on the accuracy measures of the HC II test was based on the studies of  and Wright et al. (2000), who estimated sensitivities of 88.4 percent and 81.3 percent (at the 1 RLU cut-off), respectively, and specificities of 89.0 percent and 84.5 percent, respectively [27][28].
However, choosing informative prior distributions based only on a summary of previous studies can be a complex task, since each study has elements of subjectivity, error-proneness and potential for bias. Thus, a panel of experts on cervical cancer was asked to provide their best estimates of the sensitivities and specificities of the tests, and prior distributions summarising the information from the literature review, corrected by the experts, were derived. The beta prior for each test parameter was assessed with the method presented by Joseph et al. (1995) [4], in which the hyperparameters are defined by matching the center of a range of plausible values of sensitivity or specificity with the mean of the beta distribution, and the standard deviation of the beta distribution with one quarter of the total range. We considered a vague prior distribution for the prevalence of precursor cervical lesions (a Beta distribution with hyperparameters 0.5 and 0.5; see [29]), motivated by the limited background knowledge about this parameter.
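The matching rule described above can be solved in closed form. With mean μ equal to the center of the plausible range and standard deviation σ equal to a quarter of its width, the beta variance μ(1 − μ)/(α + β + 1) = σ² gives α + β = μ(1 − μ)/σ² − 1, hence α = μ(α + β) and β = (1 − μ)(α + β). The sketch below applies this to the 29 to 56 percent cytology sensitivity range as an example only; the priors actually used in the analysis were further corrected by the expert panel, and the function name is hypothetical.

```python
def beta_from_range(low: float, high: float):
    """Beta(alpha, beta) with mean = center of [low, high] and
    standard deviation = one quarter of the range."""
    mu = (low + high) / 2
    sd = (high - low) / 4
    total = mu * (1 - mu) / sd ** 2 - 1   # alpha + beta from the beta variance
    if total <= 0:
        raise ValueError("range too wide for a beta prior with this mean")
    return mu * total, (1 - mu) * total


alpha, beta = beta_from_range(0.29, 0.56)
print(round(alpha, 2), round(beta, 2))  # 22.37 30.27
```

Narrow ranges near 0 or 1 (for example a specificity range of 97 to 100 percent) produce large, strongly asymmetric hyperparameters, which is why such priors dominate when the data carry little information.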
The median age of the 809 women who participated in the study was 34 years. Approximately three quarters of these women lived with a partner (73.0 percent) and one third had 8 or more years of education (33.3 percent). The majority self-reported as white (67.2 percent), and 64.3 percent reported not being smokers. Half of the women (50.3 percent) reported only one lifetime sexual partner, and almost three quarters (72.9 percent) had initiated sexual life during adolescence. Only 1.5 percent of the women entered the study less than one year after their first sexual intercourse, and the majority (86.1 percent) reported only one sexual partner during the last 12 months. About 7.7 percent of the women were pregnant at the time of the study.
Among the cases with available cervical cytology, 758 (93.7 percent) had normal results, 12 (1.5 percent) had low-grade squamous intraepithelial lesions (LSIL), 4 (0.5 percent) had high-grade squamous intraepithelial lesions (HSIL) and 35 (4.3 percent) had ASC-US. Table 1 shows the results of the three tests for the 809 available cases. For the Bayesian analysis, the Gibbs sampler was run for 100,000 cycles; the first 20,000 were used to assess convergence and discarded (burn-in), and the last 80,000 were used for inference. For each parameter of interest, the arithmetic mean of these 80,000 Gibbs samples is a natural Bayesian estimator. These means are shown in Table 2, with the respective 95 percent credible intervals. Table 2 also shows the positive and negative predictive values of each diagnostic test, calculated in each cycle of the Gibbs algorithm from the sampled sensitivities, specificities and prevalence (a mathematical approach is presented by Altman and Bland [30]).
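Predictive values follow from Bayes' theorem: PPV = p·Se / (p·Se + (1 − p)(1 − Sp)) and NPV = (1 − p)·Sp / ((1 − p)·Sp + p(1 − Se)). The sketch below applies these formulas element-wise, so they can be evaluated on each MCMC draw as described above and then summarized by a posterior mean and a 95 percent equal-tailed interval. Function names are hypothetical. Plugging in point estimates, as in the example, only approximates the posterior mean of the PPV.

```python
import numpy as np


def predictive_values(p, se, sp):
    """PPV and NPV from prevalence, sensitivity and specificity.
    Works element-wise on arrays of MCMC draws."""
    p, se, sp = map(np.asarray, (p, se, sp))
    ppv = p * se / (p * se + (1 - p) * (1 - sp))
    npv = (1 - p) * sp / ((1 - p) * sp + p * (1 - se))
    return ppv, npv


def summarize(draws):
    """Posterior mean and equal-tailed 95% credible interval."""
    draws = np.asarray(draws)
    return draws.mean(), np.percentile(draws, [2.5, 97.5])


# Plug-in check with the HC II point estimates (p = 0.064, Se = 0.903, Sp = 0.887).
ppv, npv = predictive_values(0.064, 0.903, 0.887)
print(round(float(ppv), 3), round(float(npv), 3))  # 0.353 0.993
```

The low PPV despite high Se and Sp is driven by the low prevalence: at p = 0.064 the false positives from the 93.6 percent of non-diseased women outnumber the true positives.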
The results suggest a low sensitivity of cervical cytology for detecting ASC-US or worse (53.6 percent), as well as of VIA (52.9 percent), but a high sensitivity of HC II (90.3 percent). All screening methods presented relatively high specificities: 97.0 percent for cervical cytology, 93.0 percent for VIA and 88.7 percent for HC II. As is evident in Table 2, all methods presented very low positive predictive values (PPV) due to the low prevalence of cervical lesions [30]. Despite its high sensitivity and specificity, HC II did not present a high PPV (35.3 percent), which is similar to that estimated for VIA. On the other hand, all methods presented high negative predictive values (NPV) (Table 2). The prevalence of precursor lesions was estimated as 6.4 percent; this low prevalence implies few diseased individuals and, consequently, low PPVs.
Another important result from the Bayesian model is the estimate of the expected number of true positives for each combination of results of the three screening methods, shown in Table 3. Because this is an expected value, it need not be an integer. The expected number of true positives divided by the number of women with a given combination of results is the estimated PPV for that combination. In Table 3, we notice that for 9 women all three tests were positive, and the expected PPV for this combination is 98.6 percent. On the other hand, when all the tests are negative, only 0.2 percent of the women with this outcome are expected to have cervical lesions.
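The per-pattern quantities come directly from Equation (2): for a result pattern (t1, t2, t3), the probability of disease is π(t), and the expected number of true positives among the women with that pattern is their count multiplied by π(t). The sketch below evaluates π(t) for all eight patterns using the point estimates reported in the text; this plug-in calculation is only an approximation to averaging π(t) over the MCMC draws, and the function name is hypothetical.

```python
from itertools import product

import numpy as np


def prob_diseased(pattern, p, se, sp):
    """P(G = 1 | test pattern) from Equation (2), conditional independence."""
    t = np.asarray(pattern)
    se, sp = np.asarray(se), np.asarray(sp)
    num = p * np.prod(se ** t * (1 - se) ** (1 - t))
    den = num + (1 - p) * np.prod((1 - sp) ** t * sp ** (1 - t))
    return num / den


# Point estimates from the text (cytology, VIA, HC II), for illustration only.
p_hat, se_hat, sp_hat = 0.064, [0.536, 0.529, 0.903], [0.970, 0.930, 0.887]
for pattern in product([1, 0], repeat=3):
    print(pattern, round(float(prob_diseased(pattern, p_hat, se_hat, sp_hat)), 4))
```

The all-positive pattern gives about 0.987 and the all-negative pattern about 0.002, in line with the 98.6 and 0.2 percent quoted above; multiplying by the 9 all-positive women gives roughly 8.9 expected true positives.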
Tables 4 and 5 show, as secondary results, the sensitivities and specificities of each combination of the screening tests. Table 4 summarizes the results when the tests are evaluated in serial combination (positive when both tests are positive and negative otherwise), and Table 5 summarizes the results when the tests are evaluated in parallel combination (positive when at least one of the tests is positive and negative otherwise). When two tests are combined, the results of the third test are ignored. The parallel combination of cervical cytology and visual inspection has a sensitivity of 78.0 percent, which is higher than the sensitivity of either test alone (53.6 and 52.9 percent, respectively; see Table 2). This result suggests an apparent improvement in sensitivity, at a cost in specificity, when these two tests are used jointly. However, this increase in sensitivity must be interpreted with caution. Franco [31] advised that a nominal increase in sensitivity always occurs by chance whenever an adjunct test, such as HC II, is used in combination with a conventional test, such as Pap cytology, even if the complementary test is totally random with respect to the disease being evaluated.
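Under the conditional-independence assumption of the model, the two combination rules have simple closed forms: the "both positive" (serial) rule gives Se1·Se2 and 1 − (1 − Sp1)(1 − Sp2), and the "at least one positive" (parallel) rule gives 1 − (1 − Se1)(1 − Se2) and Sp1·Sp2. The sketch below evaluates them with the point estimates for cytology and VIA; function names are hypothetical, and the tabulated values come from the MCMC draws rather than from these plug-in numbers.

```python
def serial(se1, sp1, se2, sp2):
    """'Both positive' rule, assuming conditional independence given disease."""
    return se1 * se2, 1 - (1 - sp1) * (1 - sp2)


def parallel(se1, sp1, se2, sp2):
    """'At least one positive' rule, assuming conditional independence."""
    return 1 - (1 - se1) * (1 - se2), sp1 * sp2


# Cytology (Se 0.536, Sp 0.970) combined with VIA (Se 0.529, Sp 0.930).
print(serial(0.536, 0.970, 0.529, 0.930))    # Se about 0.284, Sp about 0.998
print(parallel(0.536, 0.970, 0.529, 0.930))  # Se about 0.781, Sp about 0.902
```

Only the parallel rule can raise sensitivity above that of either component test, and it does so at the expense of specificity; the serial rule does the opposite.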
In a second stage, we introduced the age of the women (X) into the model as a continuous covariate. The covariate W1 is given by W1 = (X − x̄)/10, where x̄ is the sample mean of X. The division by 10 serves only to avoid numerical instability caused by large arguments of the exponential functions in the conditional posterior densities of interest. We also introduced into the model the variable W2, a dichotomous variable indicating whether the woman was currently pregnant (1 if pregnant and 0 otherwise). We first included the interaction between W1 and W2 in the model. However, all interaction parameters were estimated to be close to zero (ranging from −0.031 to 0.009), and they were excluded from the final model.
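The covariate coding described above can be sketched as a design matrix with an intercept column W0 = 1, centered and rescaled age W1, and the pregnancy indicator W2. With this scaling, the coefficient of W1 is a log odds ratio per 10 years of age. The function name and example values are hypothetical.

```python
import numpy as np


def design_matrix(age, pregnant):
    """Columns: W0 = 1, W1 = (age - mean age) / 10, W2 = pregnancy (0/1)."""
    age = np.asarray(age, dtype=float)
    w1 = (age - age.mean()) / 10.0   # centering and scaling keep exp() arguments small
    w2 = np.asarray(pregnant, dtype=float)
    return np.column_stack([np.ones_like(age), w1, w2])


W = design_matrix([24, 34, 44], [0, 1, 0])
print(W)
```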
From the conditional densities of the parameters in β, we generated 100,000 MCMC samples and discarded the first 20,000 as burn-in. Convergence was monitored by standard methods [32], and the trace plots obtained are shown in Figure 1; convergence was observed for all parameters. Prior distributions for the intercept parameters β10 to β70 were assumed with fixed hyperparameters based on the estimates obtained in the previous analysis without covariates. For example, the estimated sensitivity of cervical cytology was 0.536 (see Table 2), so, applying the logit function, the hyperparameter a10 is given by log(0.536/(1 − 0.536)) ≈ 0.144. All other hyperparameter values were chosen to give noninformative priors. Thus, we used an empirical Bayes modelling approach [33]. For each parameter, we kept every 50th draw, which gives a sample of size 1,600. Because a logit link was used, the regression coefficients in β are interpreted as logarithms of odds ratios (OR). These odds ratios measure the association between the variables W1 and W2 and the operating characteristics of the screening tests.
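The arithmetic in this paragraph can be checked with a short sketch: the intercept prior mean is the logit of the no-covariate estimate, discarding 20,000 burn-in draws and keeping every 50th of the remaining 80,000 leaves 1,600 draws, and exponentiating coefficient draws gives odds-ratio draws. Function names are hypothetical, and the chain here is a stand-in index array rather than real MCMC output.

```python
import math

import numpy as np


def logit(x: float) -> float:
    """Log-odds: maps a probability to the scale of the linear predictor."""
    return math.log(x / (1 - x))


def burn_and_thin(chain, burn_in=20_000, every=50):
    """Drop the burn-in draws, then keep every `every`-th remaining draw."""
    return np.asarray(chain)[burn_in::every]


def to_odds_ratios(beta_draws):
    """Exponentiate log-odds-ratio draws."""
    return np.exp(np.asarray(beta_draws))


a10 = logit(0.536)                            # prior mean for the cytology-Se intercept
kept = burn_and_thin(np.arange(100_000))      # stand-in for a 100,000-draw chain
print(round(a10, 3), kept.size)               # 0.144 1600
```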
Table 6 shows the posterior summaries of the exponentials of the parameters of interest in β, interpreted as odds ratios. The 95 percent credible intervals for the parameters e^β11 to e^β71 include the value 1, suggesting that there is no evidence of an effect of pregnancy on the Se and Sp of any test. The parameters e^β12 and e^β52 were estimated at 2.033 and 1.615, respectively, and their credible intervals do not include the value 1. This result suggests that the sensitivity and the specificity of VIA increase with the age of the women. In fact, several authors in the medical literature have described that methods for detecting precursor lesions of cervical cancer perform differently according to the age of women [26,34,35]. The prevalence of cervical lesions, as expected, tends to decrease as the age of the women increases (OR estimated at 0.483, with a 95 percent credible interval that does not include the value 1). This effect of age on prevalence is well known in the medical literature, since the disease is more frequent among sexually active women.
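Because W1 is age in 10-year units, an odds ratio translates into a change in probability by adding log(OR) to the log-odds. The sketch below illustrates this with the VIA sensitivity OR of 2.033, using the no-covariate estimate 0.529 as a hypothetical baseline; in the fitted model the baseline is instead given by the intercept for a non-pregnant woman of mean age. The function name is hypothetical.

```python
import math


def shift_probability(p0: float, odds_ratio: float, units: float = 1.0) -> float:
    """Probability after moving a covariate by `units` steps (here, 10 years)."""
    log_odds = math.log(p0 / (1 - p0)) + units * math.log(odds_ratio)
    return 1 / (1 + math.exp(-log_odds))


print(round(shift_probability(0.529, 2.033), 3))   # VIA Se, 10 years older: about 0.695
print(round(shift_probability(0.064, 0.483), 3))   # prevalence, 10 years older: about 0.032
```

An OR below 1, such as the 0.483 for prevalence, lowers the probability as age increases.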

Concluding Remarks
In this article, we introduced a Bayesian approach based on a Markov chain Monte Carlo (MCMC) algorithm that allows the performance measures of diagnostic tests to be estimated in the presence of covariates and when a gold standard is not available. An advantage of the proposed methodology is that the number of parameters to be estimated is not limited by the number of observations, as it is with the method introduced by Hui and Walter [3]. We used the logit link function to relate the covariates linearly to the screening performance measures, but other link functions can be used according to the nature of the data. For comparison, we also fitted Bayesian models to estimate the sensitivity and specificity of the cervical cancer screening tests presented here with other link functions, such as the complementary log-log function, but we did not observe significant changes in the parameter estimates or their inferences (results not shown). However, a misspecified model could arise from an incorrect link function, and model comparison measures such as the Deviance Information Criterion (DIC) of Spiegelhalter et al. [36] can help decide which of these functions gives the most appropriate model. An advantage of the logit link function over other functions is that it provides estimates of odds ratios, a meaningful and well-known measure of association.
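The sketch below shows the two inverse links mentioned above and the standard DIC computation from MCMC output: DIC = D̄ + pD, where D̄ is the posterior mean deviance and pD = D̄ − D(θ̄) is the effective number of parameters, with smaller DIC preferred. The function names are hypothetical, and the deviance values in the example are made up purely to show the arithmetic.

```python
import numpy as np


def inv_logit(eta):
    """Inverse of logit(theta) = log(theta / (1 - theta))."""
    return 1 / (1 + np.exp(-eta))


def inv_cloglog(eta):
    """Inverse of the complementary log-log link, log(-log(1 - theta)) = eta."""
    return 1 - np.exp(-np.exp(eta))


def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = mean deviance + pD, with pD = mean deviance - deviance at posterior mean."""
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d


print(inv_logit(0.0), inv_cloglog(0.0))   # 0.5 and about 0.632
print(dic([10.0, 12.0], 9.0))             # (13.0, 2.0)
```

The two links agree at moderate probabilities but differ in the tails (the complementary log-log is asymmetric), which is where a DIC comparison can discriminate between them.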
An important consideration in the use of the proposed model is its dependence on the prior information. In a sensitivity analysis, we noted, for example, that the estimated prevalence of cervical lesions increased when all prior distributions were non-informative, and that the sensitivity and specificity estimates also changed substantially. In the presented model, the lack of a gold standard is counterbalanced by the introduction of a latent variable G (2) that represents the unobserved reference test. This latent variable has a Bernoulli distribution whose success probability is a function of the screening performance measures, so it inherits the subjectivity of their prior distributions. Therefore, more accurate results would be obtained by incorporating reasonable prior distributions based on the knowledge of clinical experts. Although the proposed model can estimate useful performance measures of serial and parallel combinations of the screening tests (see Tables 4 and 5), the relative gains in sensitivity and losses in specificity can be misleading (see details in Franco and Ferenczy [37]). Macaskill et al. [38] argued that the expected numbers of additional true positive and false positive results (or true negative and false negative results) can be used as the basis for deciding whether to use tests in combination when neither the combined nor a component test shows superior performance based on their likelihood ratios. Thus, the comparison between the likelihood ratios of two competing tests can be used to assess the incremental gain from an adjunct test. An extension of the presented Bayesian model that includes parameters describing the comparison between likelihood ratios, as proposed by Macaskill et al. [38], should also be considered in future studies.
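The likelihood ratios on which such comparisons rest are LR+ = Se/(1 − Sp) and LR− = (1 − Se)/Sp. The sketch below computes them from the point estimates reported for the three tests; it shows only these standard definitions, not the full incremental-gain procedure of Macaskill et al., and the function name is hypothetical.

```python
def likelihood_ratios(se: float, sp: float):
    """Positive and negative likelihood ratios of a binary test."""
    return se / (1 - sp), (1 - se) / sp


estimates = {"cytology": (0.536, 0.970), "VIA": (0.529, 0.930), "HC II": (0.903, 0.887)}
for name, (se, sp) in estimates.items():
    lr_pos, lr_neg = likelihood_ratios(se, sp)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

With these estimates, cytology has the largest LR+ (about 17.9) and HC II the smallest LR− (about 0.11), which illustrates why neither test dominates the other on both measures.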
The major shortcoming of the Bayesian estimation method lies in the necessary assumption that the diagnostic tests are conditionally independent given the true disease status. This assumption may not always hold [39], and alternative methods were proposed by Espeland and Handelman (1989), Yang and Becker (1997) and Dendukuri and Joseph (2001) to address such situations [40,41,13]. However, all of these approaches address the correlation between two screening tests, and extensions to three or more tests are not found in the literature. Bayesian models that include conditional dependence between multiple screening tests should be considered in future studies.
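One way to relax the assumption for two tests, in the spirit of the covariance-term models discussed by Dendukuri and Joseph, is to add a conditional covariance to the product of sensitivities within the diseased group (and analogously with specificities in the non-diseased group). The sketch below builds the four joint probabilities for the diseased group and enforces the range of the covariance that keeps every cell non-negative; the function name and example values are hypothetical, and cov = 0 recovers conditional independence.

```python
def joint_probs_diseased(se1: float, se2: float, cov: float = 0.0) -> dict:
    """P(T1 = t1, T2 = t2 | D = 1) with a conditional covariance term."""
    lower = max(-se1 * se2, -(1 - se1) * (1 - se2))
    upper = min(se1, se2) - se1 * se2
    if not lower <= cov <= upper:
        raise ValueError("covariance outside the range that keeps probabilities >= 0")
    return {
        (1, 1): se1 * se2 + cov,
        (1, 0): se1 * (1 - se2) - cov,
        (0, 1): (1 - se1) * se2 - cov,
        (0, 0): (1 - se1) * (1 - se2) + cov,
    }


# Cytology and VIA sensitivities with a hypothetical positive dependence.
print(joint_probs_diseased(0.536, 0.529, cov=0.05))
```

Positive dependence makes concordant results more likely, so it reduces the sensitivity gained by the parallel rule compared with the conditional-independence calculation.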
It is also important to point out that the diagnostic tests evaluated in this study have some inherent flaws. The lack of accuracy and reproducibility of cervical cytology is explained by biological variability, sample quality, subjective interpretation of morphological abnormalities and examiner fatigue from repetitive procedures [42]. VIA requires training, and some examiner subjectivity is always present. HC II results are more reproducible than those of VIA and cervical cytology.
Nevertheless, VIA, proposed as a screening method by Belinson et al. (2001), is likely to assume a central role in the prevention of cervical cancer in many countries. This simple and inexpensive method does not require complex technical supplies, and it allows diagnosis and treatment at a single visit [25]. Coste et al. (2003) evaluated the performance of conventional cytology, liquid-based cytology and HC II in detecting cervical lesions in a sample of 1,757 women, using a combination of colposcopy and biopsy as the gold standard [43]. This impressive number of colposcopies and biopsies certainly reduced verification bias, but our results support the view that this type of statistical modelling can provide reliable results when only a few patients, or none, are verified by the gold standard. This matters because of the natural influence of disease prevalence on PPV and NPV. The prevalence of histologically confirmed cervical abnormalities is necessarily low in a healthy population, so screening tests usually have low PPV and high NPV: true positives are rare and true negatives are abundant [30]. Three Indian studies in the late 1990s provided evidence supporting VIA as a viable alternative to cytology as a primary screening test [44][45][46][47]. In one of these studies, Londhe et al. [45] evaluated 12,372 women who underwent VIA, Pap smear and colposcopy in a gynecology outpatient clinic. VIA identified 78 percent of the high-grade cervical lesions diagnosed with colposcopy, 3.5 percent more than were identified by cytology. In a 1998 Indian study [46] involving 3,000 women, VIA and cytology (done only by cytotechnicians) performed very similarly (sensitivity ratio of 1.05) in detecting moderate/severe dysplasia. The approximate specificity of VIA in this study was 92.2 percent, compared with 91.3 percent for cytology. In a third Indian study, published in 1999, Sankaranarayanan et al. found that VIA detected significantly more moderate/severe lesions than cytology, but its specificity was significantly lower [47]. A large-scale study (over 10,000 women) in Zimbabwe compared the performance of VIA and the Pap smear in the hands of nurse midwives in primary health clinics. Phase II of this study was the first to provide direct estimates of sensitivity/specificity because all women testing negative or positive were offered the reference standard (colposcopy and biopsy, if indicated). In this study, the sensitivity of VIA (for high-grade positivity) was 1.75 times higher than that of cytology (76.7 vs. 44.3 percent, respectively), whereas the specificity was 1.4 times lower [26].
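The dependence of PPV and NPV on prevalence follows directly from Bayes' theorem: PPV = Se·π / (Se·π + (1 − Sp)(1 − π)) and NPV = Sp(1 − π) / (Sp(1 − π) + (1 − Se)π). The Python sketch below applies these formulas to the point estimates from the abstract at the estimated prevalence of 6.4% and, for contrast, at a hypothetical prevalence of 1%. These plug-in values are only an illustration and are not the posterior PPV and NPV summaries of the fitted model, which would be computed draw by draw.

```python
# Illustrative sketch: PPV and NPV from Se, Sp and prevalence via Bayes' theorem.


def ppv(se, sp, prev):
    """P(diseased | test positive)."""
    return se * prev / (se * prev + (1 - sp) * (1 - prev))


def npv(se, sp, prev):
    """P(not diseased | test negative)."""
    return sp * (1 - prev) / (sp * (1 - prev) + (1 - se) * prev)


if __name__ == "__main__":
    # Point estimates (Se, Sp) from the abstract.
    tests = {"cytology": (0.536, 0.970), "VIA": (0.529, 0.930), "HC II": (0.903, 0.887)}
    # 0.064 is the estimated prevalence; 0.01 is a hypothetical lower value.
    for prev in (0.064, 0.01):
        for name, (se, sp) in tests.items():
            print(f"prev={prev:.3f} {name:8s} "
                  f"PPV={ppv(se, sp, prev):.3f} NPV={npv(se, sp, prev):.3f}")
```

At 6.4% prevalence these formulas give a PPV of about 0.55 and an NPV of about 0.97 for cytology, and a PPV of about 0.35 and an NPV of about 0.99 for HC II; lowering the prevalence pushes every PPV down and every NPV up.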
The studies mentioned above yielded valuable information regarding the performances of VIA and cytology. Our Bayesian estimates provided performance values that can be compared with the results obtained in the standard manner, that is, with the use of a gold standard. In our estimates, VIA and cytology had similar sensitivity and specificity, as well as PPV and NPV. These figures contradict the previously published direct estimates that reported a superiority of VIA in detecting cervical lesions [46,47]. These differences may be attributable to methodological incompatibilities in sample collection or processing and, as mentioned before, to the inherent difficulties derived from the variability of cytology and VIA interpretation.
The performance of HPV testing in screening settings has been extensively studied, but less often in direct comparison with VIA and cytology. Denny et al. (2000) published data on 2,944 women who underwent VIA, cytology and HPV testing. In this study, VIA and HPV testing (>1 RLU) performed similarly to cytology in detecting high-grade lesions, but VIA yielded the largest number of false positives among the three testing modalities [48]. More recently, the same authors [49] published a methodologically similar study of cervicography, VIA, HPV testing and cytology, in which 2,754 previously unscreened South African women underwent all four tests and VIA detected significantly more high-grade lesions than the other screening tests.
The shortcomings of the tests (such as limited reproducibility, subjective interpretation of results and the training required of professionals) did not hamper the Bayesian estimates; rather, they reinforce the case for searching for realistic estimating equations. Thus, the inclusion of covariates (or control variables) should be encouraged in studies designed to evaluate estimation methodologies. According to Parmigiani (2002), prediction models used to support clinical and health policy decision making need to consider the course of the disease over an extended period of time and to draw evidence from a broad knowledge base, including epidemiological cohort and case-control studies, randomized clinical trials, expert opinion and more [50]. In these cases, Bayesian decision theory and the tools typically used to describe the uncertainties involved can be extremely useful. The age of the patient is an important covariate in the study of cervical carcinoma precursor lesions. Koss [51] and Schiffman et al. [27,44] did not recommend HPV testing in young women because the prevalence of the virus is exceptionally high in this group and most such infections regress spontaneously in the short term. Because cervical lesions develop only in the presence of persistent HPV infection, HPV testing in young women would identify an excessive number of HPV-positive women who will never develop HPV-related cervical precancerous lesions.
We conclude that the estimated performances of VIA, HC II and cytology indicate that the Bayesian method is a valuable tool for validating diagnostic tests when a gold standard is available for only a very limited number of cases or not available at all.